An Architecture for Efficient News Items Clustering and Retrieval Based on Language Models for a Dynamic Collection of E-Newspapers

Author

  • Deepa Nagalavi
Abstract

Newspaper pages comprise multiple individual articles divided into multiple columns. The challenging part of this task is organizing and integrating the article blocks in the newspaper. This paper proposes a novel approach to article reconstruction from newspapers, including aggregation of the multiple sections of an article and recovery of the reading order of each individual article. The process combines diverse information sources, such as geometric layout and semantic content, and the sequence of article blocks is also mined in the model using a clustering algorithm to deal with complex newspaper layouts. The work consists of several subtasks: identifying the sections of an article, establishing the boundary of each individual article, and identifying the sequence of blocks. Subsequently, the reading order of English text is used to aggregate the blocks and retrieve an individual article from the newspaper.
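
As a rough illustration of the reading-order step alone, the sketch below orders text blocks column by column (left to right) and top to bottom within each column, the usual convention for English text. The `Block` type, the `column_gap` threshold, and the column-grouping rule are assumptions made for illustration; they are not taken from the paper, whose approach also draws on semantic content and clustering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical text block: top-left corner (x, y) and its text."""
    x: float
    y: float
    text: str


def reading_order(blocks: List[Block], column_gap: float = 50.0) -> List[Block]:
    """Order blocks left-to-right by column, top-to-bottom within a column.

    Assumed heuristic: a block joins the current column when its x offset
    from that column's first block is smaller than column_gap.
    """
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.x):
        if columns and block.x - columns[-1][0].x < column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])

    ordered: List[Block] = []
    for column in columns:
        # Within one column, English text is read from top to bottom.
        ordered.extend(sorted(column, key=lambda b: b.y))
    return ordered


def reconstruct(blocks: List[Block]) -> str:
    """Join the block texts in recovered reading order."""
    return " ".join(b.text for b in reading_order(blocks))
```

For example, four blocks laid out in two columns and passed in arbitrary order come back as left column first, then right column. Real layouts with headlines spanning several columns would need more than this simple x-offset rule.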

Similar resources

A Study of Information Extraction Tools for Online English Newspapers (PDF): Comparative Analysis

Information retrieval is the task of retrieving relevant and useful information from e-newspapers. Electronic newspapers are electronic replicas of traditional newspapers. E-newspapers are becoming increasingly popular because of the ease and convenience in accessing them. Newspapers are the source of timely information. These are the documents comprising news items and several independent info...

An Architecture for Efficient Document Clustering and Retrieval on a Dynamic Collection of Newspaper Texts

Clustering of related or similar objects has long been regarded as a potentially useful contribution to helping users navigate an information space such as a document collection. When documents are related by virtue of being about the same or similar topics, then this is often a good indicator that they will be relevant to the same queries and this can be used during the retrieval operation. Ma...

Adaptation and reliability of neighborhood environment walkability scale (NEWS) for Iran: A questionnaire for assessing environmental correlates of physical activity

Background: In spite of the increased range of inactivity and obesity among Iranian adults, insufficient research has been done on environmental factors influencing physical activity. As a result, adapting a subjective (self-report) measurement tool for assessing the physical environment in Iran is critical. Accordingly, in this study Neighborhood Environment Walkability Scale (NEWS) was adapted...

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...

Beyond Shot Retrieval: Searching for Broadcast News Items Using Language Models of Concepts

Current video search systems commonly return video shots as results. We believe that users may better relate to longer, semantic video units and propose a retrieval framework for news story items, which consist of multiple shots. The framework is divided into two parts: (1) A concept based language model which ranks news items with known occurrences of semantic concepts by the probability that ...

Publication date: 2017